Georeferencing Semi-Structured Place-Based Web Resources Using Machine Learning

Abstract:

In recent years, the volume of content shared on the web has grown significantly. A large part of this information is publicly available as semi-structured data, and a significant amount of it is related to place. Such information refers to a location on the Earth but does not contain explicit coordinates. In this research, we georeference semi-structured web resources using machine learning. To this end, we used real estate advertisements for the city of Tehran, Iran, published on the Divar website. The advertisements were extracted from the website with a crawler, and coordinates were assigned to them using the Random Forests algorithm. The results show that this approach can georeference the advertisements at neighborhood-level precision: the resulting precision is about 2 km in the latitude direction and 6 km in the longitude direction. Moreover, the results show that the price of the property is more important than the other variables considered in this study. It can be concluded that property prices in Tehran show a stronger spatial pattern in the North-South direction than in the East-West direction.
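The abstract reports precision separately for the latitude and longitude directions, in kilometres. It does not say how these figures were computed, so the following is only a minimal sketch of one plausible way to obtain per-axis errors. It assumes the model predicts coordinates in decimal degrees, that the error measure is the mean absolute error, and that a spherical Earth is accurate enough at city scale. The function name `axis_errors_km` and the sample coordinates are hypothetical.

```python
import math

# Mean Earth radius in km (spherical approximation, an assumption of this sketch).
EARTH_RADIUS_KM = 6371.0


def axis_errors_km(true_pts, pred_pts):
    """Return (lat_error_km, lon_error_km) as mean absolute errors.

    Both arguments are sequences of (lat_deg, lon_deg) tuples of equal length.
    """
    if len(true_pts) != len(pred_pts) or not true_pts:
        raise ValueError("inputs must be non-empty and of equal length")

    # Length of one degree of latitude on a sphere, in km.
    km_per_deg = math.pi * EARTH_RADIUS_KM / 180.0

    lat_sum = 0.0
    lon_sum = 0.0
    for (lat_t, lon_t), (lat_p, lon_p) in zip(true_pts, pred_pts):
        lat_sum += abs(lat_p - lat_t) * km_per_deg
        # A degree of longitude shrinks with the cosine of the latitude.
        lon_sum += abs(lon_p - lon_t) * km_per_deg * math.cos(math.radians(lat_t))

    n = len(true_pts)
    return lat_sum / n, lon_sum / n


if __name__ == "__main__":
    # Hypothetical coordinates near Tehran, not data from the paper.
    truth = [(35.70, 51.40), (35.75, 51.35)]
    predicted = [(35.72, 51.45), (35.74, 51.30)]
    lat_err, lon_err = axis_errors_km(truth, predicted)
    print(f"latitude error:  {lat_err:.2f} km")
    print(f"longitude error: {lon_err:.2f} km")
```

Reporting the two axes separately, rather than a single great-circle distance, is what makes a comparison such as "2 km versus 6 km" possible.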

Similar resources

Trusting Semi-structured Web Data

The growth of the Web brings an uncountable amount of useful information to everybody who can access it. These data are often crowdsourced or provided by heterogenous or unknown sources, therefore they might be maliciously manipulated or unreliable. Moreover, because of their amount it is often impossible to extensively check them, and this gives rise to massive and ever growing trust issues. T...

Browsing Semi-structured Web Texts Using Formal Concept Analysis

Query-directed browsing of unstructured Web-texts using Formal Concept Analysis (FCA) confronts two problems. Firstly on-line Web-data is sometimes unstructured and any FCA-system must include additional mechanisms to structure input sources. Secondly many online collections are large and dynamic so a Web-robot must be used to automatically extract data. These issues are addressed in this paper...

Web Entities Extraction Based on Semi-Structured Semantic Database

The Web is the biggest source of information and contains many entities and relationships between them. Extracting these data from massive numbers of Web pages and integrating them into semi-structured data with rich semantics will be more conducive to the management and use of these web data. On this premise, a comprehensive method is proposed to extract the entities and relationships from the webpages...

Intelligent Web Caching Using Machine Learning Methods

Web caching is a technology to improve network traffic on the Internet. It is a temporary storage of Web objects for later retrieval. Three significant advantages of Web caching are reductions in bandwidth consumption, server load, and latency. These advantages make the Web less expensive while providing better performance. This research aims to introduce an advanced machine learning m...

Georeferencing Flickr resources based on textual meta-data

The task of automatically estimating the location of web resources is of central importance in location-based services on the Web. Much attention has been focused on Flickr photos and videos, for which it was found that language modeling approaches are particularly suitable. In particular, state-of-the-art systems for georeferencing Flickr photos tend to cluster the locations on Earth in a rela...

Semi-automated web resource discovery and analysis: An approach based on interactive machine learning principles

This report presents a semi-automated approach to web Resource Discovery based on an architecture in which both human learning and machine learning are integrated in the same model. The fundamental idea of the semi-automated approach is to let the machine automatically identify semantic ambiguities and ask a human analyst to resolve them. An application prototype is developed that demonstrates a...

Journal title

Volume 10, Issue 2

Pages 119-129

Publication date: 2020-12
